Abstract

Much recent attention has been paid to quantifying anatomic and functional neuroimaging at the individual subject level. For optimal individual subject characterization, specific acquisition and analysis features need to be identified that maximize inter-individual variability while concomitantly minimizing intra-subject variability. Here we develop a non-parametric statistical metric that quantifies the degree to which a parameter set allows this individual subject differentiation. We apply this metric to the analysis of publicly available test-retest resting-state fMRI (rs-fMRI) data sets. We find that, for the goal of maximizing individual differentiation, there is a relative trade-off between increasing sampling through higher sampling frequency and through longer acquisition time; that, for the sizes of the interrogated data sets, only 4-5 min of acquisition time is necessary to perfectly differentiate each subject; and that the brain regions that contribute most to individuals' unique characterization lie in association cortices thought to contribute to higher cognitive function. These findings may guide optimal rs-fMRI experiment design and may aid elucidation of the neural bases of subject-to-subject differences.


1 Introduction 

Many neuroimaging studies seek to use inter-individual variability in anatomic and functional characterization to gain insight into concomitant variability in a behavioral or clinical feature of interest [1; 2]. In addition, just as a psychologist may use a patient's behavioral score to predict an outcome or guide treatment, there has been interest in developing methods and standards that allow for individual patient anatomic and functional characterization using neuroimaging [3].

Recent studies have indeed identified significant inter-individual variability in behavioral, anatomic and functional parameters, and have found that these variables correlate in significant and intriguing ways. Kanai and Rees catalogued numerous studies in which particular behavioral traits can be predicted by individual-subject, region-specific anatomic (DTI, VBM) or functional (BOLD fMRI, PET, MEG, EEG, MRS) measures [1]. Additionally, Mueller et al. demonstrated significant inter-individual variability in functional connectivity assessed with resting-state fMRI (rs-fMRI), and showed that regions of high variability coincide with regions of evolutionarily recent cortical expansion, as well as with regions thought to determine higher cognitive function [4].

For studies that seek to develop a functional characterization of an individual subject, specific acquisition and analysis features need to be identified that maximize this inter-individual variability while minimizing intra-subject variability. Given the emergence of rs-fMRI as a powerful tool for both neuroscience and clinical application [5-7], we therefore sought to analyze rs-fMRI data for factors that affect this individual subject differentiation. In particular, when compared to task-based fMRI, rs-fMRI has received interest as a clinical tool due to (i) potentially low individual variability, (ii) a lack of dependence on subject compliance with a particular task, and (iii) potentially shorter acquisition times [8]. The lack of dependence on subject compliance is particularly important in certain patient classes, such as those with functional deficits related to brain lesions, patients who may not be able to understand the task due to cognitive dysfunction or language barriers, and pediatric patients.

There are important differences between the analysis of task-based and resting-state fMRI data. Without a task with which to correlate, pairwise similarities between region or voxel timecourses are computed instead of a univariate analysis of each time-series. Additionally, a whole-brain or network-based approach is often used, as there is no prior knowledge of regions of interest. Typically in rs-fMRI connectivity studies, a measure of similarity (e.g. Pearson correlation, partial correlation, spectral coherence) is calculated between pairs of timecourses from given networks or regions of interest. These similarity values are organized into an adjacency matrix, one per subject or scan. Fundamentally, every summary statistic calculated in an rs-fMRI study, such as “global efficiency”, is computed from the data in these adjacency matrices [9].

While group-level reproducibility of rs-fMRI has been validated for several specific ROIs and networks [10-15], an analysis of whole-brain subject-level variability has only recently been described [ ]. Here, we focus
on acquisition and parcellation factors that most affect the measured inter-individual and intra-individual 
variation of whole-brain adjacency matrices between test and retest sessions. We develop a non-parametric 
statistical metric that allows us to quantify the degree to which a given acquisition and analysis scheme 
maximizes inter-individual variability while minimizing intra-individual variability. We then use this metric 
to determine which factors appear to most contribute to individual subject differentiation. 


2 Methods 

We would like to use a formal notion of reliability that enables us to compute the reliability of the data in a justified and principled fashion, and to compare the reliability of different procedures. Figure 1 shows a schematic illustration of our method, which we define formally below. The data that we analyze are subject to many different sources of variability, including (i) biological, (ii) scan acquisition, and (iii) graph inference variability. We treat each separately, as each is amenable to a different set of properties and manipulations. Minimizing biological variability is beyond the scope of this work; we therefore focus on understanding and analyzing the variability of scan acquisition and graph inference. That said, because biological variability is the clinically useful and important variability, we begin there. Let $B_{i,t}(\cdot) : \Omega \to \mathcal{B}$ be a “brain-valued” random variable, where the subscript indexes both the subject $i$ and the time $t$ of the acquisition. Thus, $B_{i,t} \in \mathcal{B}$ is the brain of subject $i$ at time $t$. For brevity, we define the index $k := k(i,t)$ to indicate brain $i$ at time $t$.

We will study the reliability of these data via analysis of their inferred functional connectomes. Here, we define a functional connectome to be a weighted graph estimated from fMRI data. Let $G_v = (V, E) \in \mathcal{G}_v$ be a graph with $|V| = v$ vertices, where $E \subseteq V \times V$ is the collection of edges amongst them. In an abuse of notation, let $G_v = (V, E, W)$ also denote a weighted graph, where $W = \{w_{ij}\}$ is the set of weights. The weighted adjacency matrix (or adjacency matrix for short), $A \in \mathbb{R}^{v \times v}$, is a $v \times v$ matrix whose elements are $a_{ij} = 0$ when there is no edge between vertices $i$ and $j$, and $a_{ij} = w_{ij}$ when there is an edge between $i$ and $j$. Note that these graphs are undirected, so that $a_{ij} = a_{ji}$. Next we explain how we estimate these functional connectomes.

2.1 Scan Acquisition Methods 

Formally, let $S_m(\cdot) : \mathcal{B} \to \mathcal{X}$ be the $m$-th acquisition function, which takes as input a real brain at some time and outputs some data $X_k^m = S_m(B_k) \in \mathcal{X}$. The index $m$ here indexes different scanners and/or acquisition protocols. Note that we cannot have both $X_k^m$ and $X_k^{m'}$, because $k$ indexes time as well, so we can drop the superscript for brevity without loss of generality. We consider four different acquisition functions:
1. KKI: Often referred to as “KIRBY21”, KKI is a collection of 21 subjects, each scanned twice at 3T for 7 minutes, TR 2.0 sec [16]. One subject was excluded due to data quality.

2. NKI Standard: A total of 23 subjects acquired at 3T using single-band protocols for 5 minutes, TR 2.5 sec [17].

3. NKI Multiband TR=1.4 sec: Same subjects and scanner as above, but using a multiband sequence with a faster TR of 1.4 sec [17].

4. NKI Multiband TR=0.645 sec: Same as above, but with an even faster TR of 0.645 sec [17].

Thus, in total, $S_m$ consists of the scanner sequences and pre-processing routines, for $m \in [4] = \{1, 2, 3, 4\}$. The output of any such $S_m$ is a multivariate time-series, $X^m \in \mathcal{X} \subseteq \mathbb{R}^{P \times N_m}$, where $P$ is the number of voxels in the brain, and $N_m$ is the number of time steps for procedure $m$. We refer to each such observation, $X_k$, as a “brain scan”.

2.2 Graph Inference Methods 

Let $T_p(\cdot) : \mathcal{X} \to \mathcal{G}$ be the $p$-th graph inference routine, which estimates the brain-graph (or connectome) from a brain scan. The graph inference procedures that we consider consist of a sequence of steps.

Shared Pre-processing: After the data came off the scanner, we conducted the following pre-processing on each scan independently. Each BOLD time-series underwent a standard preprocessing sequence of slice timing correction, motion co-registration, spatial normalization to the MNI152 2 mm template, detrending via high-pass filtering, Principal Component Analysis-based nuisance and motion parameter regression (following [18]), bandpass temporal filtering (0.1-1.0 Hz) and spatial smoothing (6 mm Gaussian kernel). Note that the multiband data sets did not undergo slice timing correction. All analysis was completed in MATLAB (Mathworks, Natick, MA) and SPM8 (Wellcome Trust, UK).
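The temporal band-pass step can be sketched as follows. This is a minimal illustration assuming a zero-phase Butterworth filter from SciPy (`scipy.signal.butter`, `scipy.signal.filtfilt`); the filter family, the order, and the helper name `bandpass` are assumptions and not the SPM8 pipeline used here. One constraint applies to any such digital filter: both cutoffs must lie below the Nyquist frequency 1/(2·TR), which is 0.25 Hz at TR = 2.0 s and 0.2 Hz at TR = 2.5 s, so the band has to be checked against the TR of each data set. The cutoffs in the usage lines are illustrative only.

```python
import numpy as np
from scipy.signal import butter, filtfilt


def bandpass(ts, tr, low_hz, high_hz, order=2):
    """Zero-phase Butterworth band-pass of `ts` (time on the last axis).

    tr: repetition time in seconds, so the sampling rate is 1/tr Hz.
    Hypothetical helper; filter type and order are illustrative choices.
    """
    fs = 1.0 / tr
    nyquist = fs / 2.0
    if not (0.0 < low_hz < high_hz < nyquist):
        raise ValueError(
            f"need 0 < low < high < Nyquist ({nyquist:.3f} Hz at TR={tr} s)"
        )
    b, a = butter(order, [low_hz, high_hz], btype="bandpass", fs=fs)
    # filtfilt runs the filter forward and backward, so the output has no phase lag
    return filtfilt(b, a, ts, axis=-1)


# Illustrative usage: 300 volumes at TR = 2.0 s
t = np.arange(300) * 2.0
slow = np.sin(2 * np.pi * 0.05 * t)   # inside the illustrative 0.01-0.1 Hz band
fast = np.sin(2 * np.pi * 0.2 * t)    # above the illustrative upper cutoff
filtered = bandpass(slow + fast, tr=2.0, low_hz=0.01, high_hz=0.1)
```

Passing a band whose upper edge exceeds the Nyquist frequency raises an error rather than silently producing an ill-defined filter.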

Parcellation Scheme: Two parcellation schemes were tested. For each, the target ROI number was varied in powers of 2 from 128 to 2048. Letting $V$ be the set of voxels, we define a partition into $C$ clusters $V_1, \ldots, V_C$ such that $V_c \cap V_{c'} = \emptyset$ for all $c \neq c'$ and $\bigcup_c V_c = V$. For the first parcellation scheme, the gray matter was subdivided into uniformly sized ROIs (maximum size difference between ROIs of one voxel) [19]. Thus, $|V_c| \approx K_C$ for all $c$, where $|\cdot|$ is the set cardinality operator and $K_C$ is the approximate number of voxels per cell in the partition. For the second, functional, scheme, we used a recently published method based on clustering rs-fMRI data, which groups voxels so as to maximize intra-ROI relative to inter-ROI connectivity while maintaining spatial proximity [20]. The fMRI data used to generate these parcellations came from three subjects of a different, publicly available data set included in the distribution of the published parcellation code.
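The partition conditions above (disjoint cells that cover $V$, with sizes differing by at most one voxel) can be illustrated with a toy construction. This is not the algorithm of [19]: as a deliberate simplification, it orders gray-matter voxels lexicographically by their (x, y, z) coordinates and cuts that ordering into $C$ consecutive chunks, which yields slab-like rather than compact ROIs. The function name `uniform_partition` and the coordinate input are hypothetical.

```python
import numpy as np


def uniform_partition(coords, n_clusters):
    """Toy partition of voxels into n_clusters cells whose sizes differ by <= 1.

    coords: (P, 3) integer voxel coordinates of gray-matter voxels.
    Returns labels: (P,) array with values in 0..n_clusters-1.
    NOTE: illustrative only; ignores spatial compactness of the ROIs.
    """
    # np.lexsort uses the LAST key as primary, so pass (z, y, x) to sort by x, then y, then z
    order = np.lexsort((coords[:, 2], coords[:, 1], coords[:, 0]))
    labels = np.empty(len(coords), dtype=int)
    # np.array_split yields chunks whose lengths differ by at most one
    for c, chunk in enumerate(np.array_split(order, n_clusters)):
        labels[chunk] = c
    return labels


rng = np.random.default_rng(0)
coords = rng.integers(0, 20, size=(1000, 3))
labels = uniform_partition(coords, 128)
sizes = np.bincount(labels, minlength=128)
# here K_C is about |V| / C = 1000 / 128, i.e. 7 or 8 voxels per cell
```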

Timecourse Extraction Method: Regardless of the parcellation scheme used, let $X_k^c = \{X_k(p)\}_{p \in V_c} \in \mathbb{R}^{|V_c| \times N_m}$ be the submatrix extracting just the rows corresponding to voxels in cell $V_c$. Dropping the subscript $k$ for brevity, we consider two approaches to obtaining a univariate time-series from each such matrix. First, we compute the mean ROI time-series, $x^c_{\text{mean}} = \frac{1}{|V_c|} \sum_{p \in V_c} X^c(p)$, where $x^c_{\text{mean}} = (x^c_{\text{mean}}(1), \ldots, x^c_{\text{mean}}(N_m)) \in \mathbb{R}^{N_m}$. Second, let $[D_c, U_c]$ be the eigenvalues and eigenvectors of the $|V_c| \times |V_c|$ covariance matrix of $X^c$, and let $x^c_{\text{eig}}$ be the projection of $X^c$ onto its first principal component, that is, onto the eigenvector associated with the largest eigenvalue. Finally, let $Y_k^{\psi} \in \mathbb{R}^{C \times N_m}$ be the dimension-reduced time-series corresponding to observation $k$ with $C$ ROIs, using $\psi \in \{\text{mean}, \text{eig}\}$.
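The two extraction approaches can be sketched in NumPy. This is a minimal sketch, assuming a (P voxels × N time points) array and an integer cell label per voxel; the helper name `extract_timeseries` is hypothetical. For the eigenvariate, each voxel's time-series is mean-centred and the first left singular vector of the centred ROI matrix (equivalently, the leading eigenvector of the voxel covariance) defines the projection. Because the sign of a principal component is arbitrary, the sketch flips it to agree with the ROI mean; this sign convention is an assumption, and implementations differ in how they sign and scale the result.

```python
import numpy as np


def extract_timeseries(X, labels, method="mean"):
    """Reduce voxel time-series X (P x N) to ROI time-series Y (C x N).

    labels: (P,) integer cell index per voxel, 0..C-1.
    method: "mean" (ROI average) or "eig" (projection on first principal component).
    """
    n_rois = int(labels.max()) + 1
    Y = np.empty((n_rois, X.shape[1]))
    for c in range(n_rois):
        Xc = X[labels == c]                              # |V_c| x N submatrix
        mean_ts = Xc.mean(axis=0)
        if method == "mean":
            Y[c] = mean_ts
        elif method == "eig":
            Xc0 = Xc - Xc.mean(axis=1, keepdims=True)    # centre each voxel over time
            # left singular vectors of Xc0 are eigenvectors of the voxel covariance
            U, _, _ = np.linalg.svd(Xc0, full_matrices=False)
            ts = U[:, 0] @ Xc0                           # projection onto first component
            # the component's sign is arbitrary; align it with the ROI mean
            if ts @ (mean_ts - mean_ts.mean()) < 0:
                ts = -ts
            Y[c] = ts
        else:
            raise ValueError(f"unknown method: {method}")
    return Y
```

Because Eq. (1) below is invariant to positive rescaling of each ROI time-series, the scaling convention of the eigenvariate does not change $w_{ij}$; the sign convention does.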
Graph Inference: Consider the matrix $Y_k = \{y_{i,t}\}$, where $y_{i,t}$ is the value of the $i$-th ROI at time $t$. To obtain our connectome estimate, we compute the Pearson correlation matrix:

$$w_{ij} = \frac{\sum_{t \in [N_m]} y_{i,t}\, y_{j,t} - N_m \bar{y}_i \bar{y}_j}{(N_m - 1)\, s_i s_j}, \qquad (1)$$

where $\bar{y}_i$ and $s_i$ are the sample mean and standard deviation for ROI $i$. Finally, we let $a_{ij} = w_{ij}$ if $w_{ij} > \tau$, and $a_{ij} = 0$ otherwise, for some threshold $\tau$. We only allow $\tau$ to be greater than $-\infty$ for Figure 4 (bottom), where we consider a range of thresholds. Thus, the graph of observation $k$ is represented by the adjacency matrix $A_k = \{a_{ij}\}$, and we use these estimates to conduct our reliability analysis.
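Equation (1) and the threshold $\tau$ translate directly into NumPy. This is a minimal sketch, assuming a (C × N_m) array of ROI time-series; the function name `adjacency` is hypothetical. Its output should match `np.corrcoef`, which computes the same sample Pearson correlation.

```python
import numpy as np


def adjacency(Y, tau=-np.inf):
    """Pearson correlation matrix of ROI time-series Y (C x N), thresholded at tau.

    Implements Eq. (1): (sum_t y_it y_jt - N ybar_i ybar_j) / ((N - 1) s_i s_j),
    then keeps a_ij = w_ij only where w_ij > tau (tau = -inf keeps every edge).
    """
    n = Y.shape[1]
    ybar = Y.mean(axis=1)
    s = Y.std(axis=1, ddof=1)                    # sample standard deviation (N - 1)
    W = (Y @ Y.T - n * np.outer(ybar, ybar)) / ((n - 1) * np.outer(s, s))
    return np.where(W > tau, W, 0.0)
```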


2.3 Reliability Analysis 

2.3.1 Rank Sum Statistic 

Let $q = (p, m)$ index the pair of acquisition and graph inference procedures, so that $\mathcal{V}_q = T_p \circ S_m$. The goals of this reliability analysis are to:

1. quantify the reliability of various choices of procedures $\mathcal{V}_q$, and

2. evaluate whether any $\mathcal{V}_q$'s under consideration are sufficient.

We consider a non-parametric, order statistic based strategy. This strategy has a number of advantages 
over classical approaches, such as intraclass correlation [21]. First, it can operate on multivariate, and even 
non-Euclidean observations, rather than only real-valued observations. Second, it is model-free, in that it 
does not make any assumptions about the distribution of the data. Third, because it is based on order 
statistics, it is robust to many kinds of artifacts, such as spurious or missing data. 

To proceed, let $\delta(\cdot, \cdot) : \mathcal{G} \times \mathcal{G} \to \mathbb{R}_+$ be a distance metric between a pair of graphs. Based on this metric, we define an order statistic for each observation. If the data are reliable, then $\delta(G_{i,t}, G_{i,t'})$, the distance between two scans of the same subject, is relatively small for all $i$. Let $R^q_k$ be the rank score of observation $k$ for procedure $q$, defined as follows. For each observation $k$, we sort the remaining observations in increasing order of distance, such that $\delta_{k(1)} < \delta_{k(2)} < \cdots < \delta_{k(n-1)}$. Recalling that $k$ indexes a subject and time pair, $R^q_{i,t}$ is the rank of observation $(i, t')$ with respect to observation $(i, t)$.

Ideally, the $R^q_k$'s are small for a given procedure. We would like to use these order statistics to assess the reliability of different acquisition and graph inference functions. We define the reliability of $\mathcal{V}_q$ as the sum of ranks of all graphs inferred for that procedure. In other words, we let $\mathcal{R}_q = \sum_k R^q_k$. Note that $n \le \mathcal{R}_q \le n(n-1)$, so this reliability metric is closed and bounded.
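As a worked illustration of these bounds (our own arithmetic, assuming two scans per subject and the subject counts of Section 2.1): KKI has 20 subjects after exclusion, so $n = 40$, and the NKI data sets have 23 subjects, so $n = 46$. If each scan's partner were a uniformly random other scan, its rank would be uniform on $\{1, \ldots, n-1\}$ with mean $n/2$, giving a chance-level rank sum of $n^2/2$:

$$\begin{aligned}
n = 40:&\quad 40 \le \mathcal{R}_q \le 40 \cdot 39 = 1560, &\quad \mathbb{E}_{\text{chance}}[\mathcal{R}_q] = 40 \cdot 20 = 800,\\
n = 46:&\quad 46 \le \mathcal{R}_q \le 46 \cdot 45 = 2070, &\quad \mathbb{E}_{\text{chance}}[\mathcal{R}_q] = 46 \cdot 23 = 1058.
\end{aligned}$$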

To complete the description of the reliability statistic, we must choose a graph distance metric, $\delta$. To be in alignment with the neuroimaging community, we choose the Frobenius norm between weighted adjacency matrices:

$$\delta(G_k, G_{k'}) = \|A_k - A_{k'}\|_F, \qquad (2)$$

where $\|A\|_F = \sqrt{\sum_{i,j} a_{ij}^2}$.
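The rank sum $\mathcal{R}_q$ with the Frobenius distance of Eq. (2) can be sketched as below. This is a minimal sketch, assuming a stack of $n$ adjacency matrices and a subject label per scan with exactly two scans (test and retest) per subject; the names `frobenius_distances` and `rank_sum` are hypothetical. Ties in distance are resolved in the subject's favour (the partner receives the smallest tied rank), a choice the text does not specify.

```python
import numpy as np


def frobenius_distances(A):
    """A: (n, v, v) stack of adjacency matrices -> (n, n) matrix of ||A_k - A_k'||_F."""
    flat = A.reshape(A.shape[0], -1)
    diff = flat[:, None, :] - flat[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def rank_sum(D, subject):
    """Sum over scans of the rank of the same subject's other scan (1 = nearest).

    D: (n, n) distance matrix; subject: (n,) subject label per scan,
    assuming exactly two scans per subject.
    """
    subject = np.asarray(subject)
    n = D.shape[0]
    total = 0
    for k in range(n):
        others = np.arange(n) != k
        partner = np.flatnonzero(others & (subject == subject[k]))[0]
        # rank = 1 + number of other scans strictly closer than the partner
        total += 1 + int(np.sum(D[k, others] < D[k, partner]))
    return total
```

With perfectly separated subjects every partner has rank 1, so the rank sum equals the number of scans, the lower bound $n$.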



2.3.2 Permutation Testing 

To assess the reliability of the data, we start by assuming that each graph is sampled independently from some distribution, $A_{i,t} \sim F_{i,t} \in \mathcal{F}$. If each $F_{i,t} = F$ is the same, then we could not hope to distinguish individual subjects. However, if the data from each subject were sampled from its own distribution, that is, $A_{i,t} \sim F_i$, and each $F_i$ were sufficiently different from the others, then the data would be reliable. Let $\Delta : \mathcal{F} \times \mathcal{F} \to \mathbb{R}_+$ be a metric computing the distance between densities, and let $\Delta(F^q_i, F^q_{i'})$ be the distance between distributions $F^q_i$ and $F^q_{i'}$, where $q$ indexes the procedure for obtaining the graphs, as described above. The test statistic we consider is therefore $\Delta_q = \max_{i \neq i'} \Delta(F^q_i, F^q_{i'})$. A reliability analysis can then be thought of in terms of the following hypothesis test:

$$H_0 : \Delta_q \le \epsilon, \qquad H_A : \Delta_q > \epsilon.$$

Rejecting the null in favor of the alternative implies that we believe that procedure $\mathcal{V}_q$ is significantly more reliable than chance. We can estimate the null distribution via the following permutation test. For $B$ Monte Carlo simulations, we permute the labels, such that $(i, t) \to (i', t')$. Because we lack the actual densities $\{F^q_i\}$, and because estimating high-dimensional densities is a challenging computational statistics problem, we use $\mathcal{R}_q$ as a surrogate test statistic. Given the permuted data, it is easy to compute $\mathcal{R}_q^{(b)}$ for $b \in [B]$, and therefore to obtain an empirical estimate of the distribution of $\mathcal{R}_q$ under the specific null where $\epsilon = 0$.
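The permutation null for $\mathcal{R}_q$ can be sketched as follows, assuming a precomputed distance matrix and two scans per subject. Shuffling the subject labels keeps two scans per label but pairs them at random, which realizes the $(i, t) \to (i', t')$ relabelling. Because small rank sums indicate reliability, the one-sided p-value counts permutations whose rank sum is at most the observed one, with the usual +1 correction. The helper names and the default $B = 1000$ are illustrative assumptions, and the rank-sum helper repeats Section 2.3.1 so that the block runs on its own.

```python
import numpy as np


def rank_sum(D, subject):
    """Rank sum of test-retest partners (1 = nearest), two scans per subject."""
    n = D.shape[0]
    total = 0
    for k in range(n):
        others = np.arange(n) != k
        partner = np.flatnonzero(others & (subject == subject[k]))[0]
        total += 1 + int(np.sum(D[k, others] < D[k, partner]))
    return total


def permutation_test(D, subject, n_perm=1000, seed=0):
    """Return the observed rank sum, the B null rank sums, and a one-sided p-value."""
    rng = np.random.default_rng(seed)
    subject = np.asarray(subject)
    observed = rank_sum(D, subject)
    # each permutation randomly re-pairs scans while keeping two scans per label
    null = np.array([rank_sum(D, rng.permutation(subject)) for _ in range(n_perm)])
    p_value = (1 + np.sum(null <= observed)) / (n_perm + 1)
    return observed, null, p_value
```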


3 Results 

3.1 Stability of Inter-Individual Differences 

A nonparametric rank sum metric was used to compare the ability of different acquisition and parcellation variable values to distinguish individual subjects (Methods; Figure 1). The benefits of using this statistic for comparison include having an absolute minimum (the number of scans in the data set) and a null distribution that is easily estimated by random permutation of pairings. This metric is minimized by factors that allow maximal differentiation of individual subjects, with greatest reproducibility between test and retest; in the optimal case, the rank of the adjacency matrix distance between test and retest is 1 for every test-retest pair, and the rank sum equals the number of scans in the data set.

We used minimization of this rank sum to define individual subject differentiability. For each analysis 
and acquisition parameter that was varied, several general trends were seen (Figures 2 and 3). As expected, a 
longer acquisition time and a higher number of ROIs in the parcellation allowed greater ability to differentiate 
between individual subjects, with acquisition parameters (such as length of acquisition and TR) having 
greater effect. 

Acquisitions with shorter TR tended to allow greater individual subject differentiation, with the multiband acquisition data sets producing lower rank sums than standard acquisitions for the same acquisition duration. For the data sets analyzed, between 7-10 min of acquisition time was sufficient to minimize the resultant rank sum, suggesting that longer acquisitions would yield only marginal gains in individual subject characterization for the number of subjects in these data sets (n=20 or 23, Figures 2 and 3).

To determine whether the effect of TR on minimum rank sums was purely a function of the number of samples rather than of acquisition time, the data were organized by the number of data points acquired instead of by real time (Figure 4, top). These data suggest that increased sampling frequency alone does not ensure lower rank sums. Instead, data sets with smaller TR tended toward higher rank sums when the number of data points was held constant, suggesting an apparent trade-off between increased sampling frequency and longer acquisition times [22].
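The difference between organizing the data by time and by number of points is simple arithmetic, $N_m = \lfloor T / \mathrm{TR} \rfloor$. The sketch below uses the durations and TRs listed in Section 2.1 for KKI and NKI standard; it ignores any discarded initial volumes, and the helper names are hypothetical. Holding the number of points fixed at the NKI standard value shows how much shorter the corresponding multiband acquisitions are in real time.

```python
import math


def n_timepoints(duration_s, tr_s):
    """Number of volumes acquired in duration_s seconds at repetition time tr_s."""
    # small epsilon guards against floating-point error when duration is a multiple of TR
    return math.floor(duration_s / tr_s + 1e-9)


def duration_for(n_points, tr_s):
    """Real acquisition time (s) needed to collect n_points volumes."""
    return n_points * tr_s


kki = n_timepoints(7 * 60, 2.0)        # 210 volumes
nki_std = n_timepoints(5 * 60, 2.5)    # 120 volumes
for tr in (2.5, 1.4, 0.645):
    print(f"TR={tr} s: {nki_std} volumes take {duration_for(nki_std, tr):.1f} s")
```

At a fixed 120 volumes, the TR = 0.645 s acquisition spans about 77 s of real time versus 300 s at TR = 2.5 s, which is why equal point counts do not imply equal information about slow fluctuations.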

As thresholding of the adjacency matrices is commonly used to reduce data dimensionality and eliminate likely noisy data, the minimum rank sum was calculated for each adjacency matrix after eliminating correlations below a percentile threshold (i.e. at a “25%” threshold, all correlations below the 25th percentile of correlations were set to 0). With this analysis (Figure 4, bottom), the data set with the least sampling (NKI, standard acquisition with TR=2500 ms, 5 min acquisition) showed a steady improvement in subject differentiation with increasing thresholding. However, for data sets that otherwise achieved or were close to the minimum rank sum, there was no such effect.
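The percentile threshold used for Figure 4 (bottom) can be sketched as follows, assuming a symmetric correlation matrix and computing the percentile over the off-diagonal upper-triangle entries only. The text does not say whether the diagonal enters the percentile; excluding it is our assumption, and the helper name is hypothetical.

```python
import numpy as np


def percentile_threshold(A, pct):
    """Set correlations below the pct-th percentile of off-diagonal edges to zero."""
    iu = np.triu_indices(A.shape[0], k=1)       # each undirected edge counted once
    cutoff = np.percentile(A[iu], pct)
    # the diagonal (self-correlation = 1) is always >= cutoff and therefore kept
    return np.where(A >= cutoff, A, 0.0)
```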
Figure 1: Minimum rank sum statistic enables non-parametric comparisons of test-retest acquisition and analysis parameters. To assess analytic efficiency in differentiating individual subjects, adjacency matrices were first made for each scan, where each matrix element corresponds to the correlation between time-series extracted from the indicated pair of regions of interest (ROIs; left). Next, for each dataset, a distance matrix was calculated where each element corresponds to the Euclidean distance (square root of the sum of squared differences of individual matrix elements) between the indicated pair of adjacency matrices (middle). Then, a rank matrix was calculated where each element corresponds to the rank of the distance between the indicated pair of scans, compared to the set of distances between that row's scan and each other scan in the dataset. Note that the rank matrix is not necessarily symmetric (right; minimum rank set to 1).
Figure 2: Varying acquisition time and parcellation for each test-retest data set. Top: Sum of distances between test-retest scan pairs versus acquisition time, for varied numbers of regions of interest in each parcellation (data using a functional parcellation scheme [20] and eigenvariate time-series extraction). Dashed line is the mean ± sem for a set of 1000 randomly assorted pairs for the maximum ROI parcellation. Bottom: Sum of ranks of test-retest scan pairs for the indicated data set by ROI, using the same conditions as the top row. Black line at bottom is the minimum possible rank sum.
Figure 3: Varying methods of time-series extraction and parcellation generation for each test-retest data set. Top: Sums over test-retest scan pairs comparing methods of time-series extraction, plotted as in Figure 2. Bottom: Sums over test-retest scan pairs comparing methods of parcellation generation, plotted as in Figure 2.
Figure 4: Factors affecting attainment of the minimum rank sum. Top: Rank sums of test-retest pairs for each data set as a function of the number of data points acquired. Bottom: Adjacency matrices were thresholded such that edges with correlation less than the indicated percentile were set to zero. Values presented are for the indicated data set processed using the functional parcellation of 1920 ROIs and eigenvariate time-series extraction.
Table 1: A genetic algorithm perfectly sorts true test-retest pairs using minimal acquisition time by minimizing the rank sum metric, with minimal variation with choice of parcellation. Elements of the table show the minimum time (min) to perfectly sort scans from the indicated datasets and parcellation schemes.

Dataset | functional, 1000 ROIs | functional, 1920 ROIs | uniform, 1024 ROIs | uniform, 2048 ROIs
KKI (TR=2.0s) | 3 | 3 | 3 | 3
NKI (TR=2.5s) | 4 | 5 | 4 | 5
NKI (TR=1.4s) | 3 | 3 | 4 | 3
NKI (TR=0.645s) | 5 | 5 | 5 | 4


Table 2: A genetic algorithm perfectly sorts true test-retest pairs using minimal acquisition time by minimizing the rank sum metric, with minimally longer times needed for higher numbers of subjects. Minimum time (min) for an unsupervised genetic algorithm to use test-retest rank sum minimization to perfectly sort scans from the indicated datasets into individual pairs, by number of subjects in the group (N). When varying N below the maximum of the data set, the set of subjects was randomly chosen from the total data set 20 times; presented are median values.

Dataset | N=10 | N=15 | N=20 | N=23
KKI (TR=2.0s) | 3 | 3 | 3 | N/A
NKI (TR=2.5s) | 4 | 4 | 5 | 5
NKI (TR=1.4s) | 2 | 2 | 3 | 3
NKI (TR=0.645s) | 2 | 2 | 4.5 |
3.2 Localization of Inter-Individual Differences

To explore the acquisition time necessary to differentiate individual subjects by their non-thresholded similarity matrices, an unsupervised genetic algorithm was used to sort scans into their test-retest pairs, using the rank sums as the optimization metric and no information other than the similarity matrices. Only 3-5 min of acquisition time was necessary to perfectly sort up to 23 subjects without any ad hoc subject labeling (Tables 1 and 2). Choice of parcellation type produced only marginal differences in the time necessary for perfect pairing of subjects, as did the number of ROIs in the parcellation beyond ~1000.
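The genetic algorithm itself is not specified in detail here, so we do not reproduce it. As a simpler, swapped-in check of the same end point, note that perfect sorting corresponds to the minimum rank sum, where every scan's nearest neighbour is its own retest partner. The sketch below pairs scans by mutual nearest neighbours using only the distance matrix, and uses subject labels only afterwards to verify the pairing. The function names are hypothetical, and this check is not the authors' algorithm.

```python
import numpy as np


def mutual_nn_pairs(D):
    """Pair scans by mutual nearest neighbour; return None if any scan is unpaired.

    D: (n, n) distance matrix between scans; no subject labels are used.
    """
    D = np.array(D, dtype=float)                # copy so the caller's matrix is untouched
    np.fill_diagonal(D, np.inf)                 # a scan is not its own neighbour
    nn = D.argmin(axis=1)
    if not np.all(nn[nn] == np.arange(len(nn))):
        return None                             # some nearest-neighbour relation is not mutual
    return sorted({tuple(sorted((int(k), int(nn[k])))) for k in range(len(nn))})


def perfectly_sorted(D, subject):
    """True if the label-free pairing matches every test scan to its own retest."""
    pairs = mutual_nn_pairs(D)
    return pairs is not None and all(subject[a] == subject[b] for a, b in pairs)
```

Repeating this check on acquisitions truncated to their first 1, 2, 3, ... minutes would give a minimum-time figure comparable in spirit to Tables 1 and 2, though not necessarily identical to the genetic algorithm's result.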

To further determine the brain regions and connections that contribute to this individual subject differentiation, for each dataset the rank sum metric was calculated for the differences of each element of the adjacency matrix (using the functional parcellation with the maximal number of ROIs and eigenvariate time-series extraction). The brain regions containing the highest proportion of matrix elements with the lowest rank sum (<5th percentile) were determined (Figure 5). These regions were found to lie in secondary, association cortices, including the parietal and prefrontal cortices, and not in primary motor or sensory cortices.
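The edge-wise localization can be sketched by treating each adjacency-matrix element as its own one-dimensional feature, so that the distance between two scans for edge $(i, j)$ is $|a^{k}_{ij} - a^{k'}_{ij}|$, and then counting, per ROI, the edges whose rank sum falls at or below the 5th percentile. This is a minimal sketch under those assumptions; including ties at the cutoff (integer rank sums tie often) and normalizing counts by the $v - 1$ edges per ROI are our choices, and the function names are hypothetical.

```python
import numpy as np


def edgewise_rank_sums(A, subject):
    """A: (n, v, v) adjacency matrices -> (n_edges,) rank sums and upper-triangle indices."""
    subject = np.asarray(subject)
    n, v, _ = A.shape
    iu = np.triu_indices(v, k=1)
    E = A[:, iu[0], iu[1]]                        # (n scans, n_edges)
    D = np.abs(E[:, None, :] - E[None, :, :])     # per-edge distance between scans
    sums = np.zeros(E.shape[1], dtype=int)
    for k in range(n):
        others = np.arange(n) != k
        partner = np.flatnonzero(others & (subject == subject[k]))[0]
        # rank of the partner among the other scans, computed separately for every edge
        sums += 1 + np.sum(D[k, others, :] < D[k, partner, :], axis=0)
    return sums, iu


def roi_proportions(sums, iu, v, pct=5):
    """Fraction of each ROI's v-1 edges whose rank sum is <= the pct-th percentile."""
    low = sums <= np.percentile(sums, pct)
    counts = np.zeros(v, dtype=int)
    np.add.at(counts, iu[0][low], 1)
    np.add.at(counts, iu[1][low], 1)
    return counts / (v - 1)
```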


4 Discussion 

Using the rank sum metric we describe (Figure 1), a nonparametric test of the ability of a specific acquisition/analysis parameter set to differentiate individual subjects, we see several general trends. Perhaps unsurprisingly, more data allows for generally better individual differentiation (Figures 2 and 3). For instance, longer acquisition times yield lower rank sums, as do sequences with smaller TR. This effect seemed largely due to the increased number of samples, although TR seemed to somewhat modulate these results (Figure 4, top). From our results, it was not clear whether multiband acquisition alone had any advantages over standard acquisitions aside from the lower TR. Higher numbers of ROIs, with correspondingly smaller individual ROIs, allowed for somewhat greater individual differentiation (Figure 2). This may be explained by less averaging together of dissimilar regions. Factors that had at most a minor effect on individual subject differentiation were (i) choice of parcellation strategy, and (ii) time-series extraction method (Figure 3).

Figure 5: Map of brain regions that define unique identifiers of individuals. Brain regions are color coded by the number of connections with the lowest (<5th percentile) test-retest rank sum. Warmer colors code for higher individual differentiation (a higher number of connections with low test-retest rank sum).

For relatively undersampled datasets, thresholding the adjacency matrices (likely removing predominantly noise-contributing elements) allowed for greater individual subject differentiation (Figure 4, bottom). However, this effect was not seen for data sets with lower TR, suggesting that increased sampling alone may be sufficient to reach the attainable inter-individual differentiation. It was surprising that apparently limited amounts of data could allow for robust individual subject differentiation (Tables 1 and 2). With only 2 min of BOLD acquisition time, using the multiband data sets, an unsupervised algorithm could reliably and perfectly sort up to 15 subjects, a typical group size for many fMRI experiments, into the appropriate test-retest pairs. This sorting occurred even though the algorithm had no labels indicating which scan corresponded to which subject, or even whether a particular scan was the initial test or the retest scan. The acquisition time necessary for reliable, perfect, fully unsupervised sorting of scans increased with the number of subjects, up to only 5 min for the full n=23 data sets. These results underscore that rs-fMRI data alone contain sufficient information to robustly differentiate individual subjects, and allow for analysis of the factors that contribute to individual subject uniqueness.

From the analysis of the brain regions and connections that contribute most to individual subject differentiation, several interesting features were identified (Figure 5). First, primary sensory cortices and deep grey matter structures do not appear to contribute especially to individual subject differentiation. This may be because such regions likely have a largely invariant functional anatomy and connectivity from person to person, and therefore would not enable differentiation between individuals. In contrast, the regions that appear to contribute most to individual subject differentiation are found in association and secondary cortices in the prefrontal cortex, the precuneus and parietotemporal cortices. These latter cortical regions are thought to have undergone evolutionarily recent cortical expansion, supporting their putative role in higher cognitive processes [23]. It is plausible that unique features of individuality would lie within these regions, which are thought to contribute to higher-level association and conceptualization and are therefore more likely to depend on an individual’s personality.

Of note, these latter regions are thought to comprise much of the default mode (DMN), attention (ATT) 
and executive control (EC) networks [24]. These networks have been implicated in a heterogeneous array 
of interesting effects in the rs-fMRI literature. Our findings further suggest that these networks are the 
highest signal regions for determining the pertinent functional connectivity for an individual subject. As 
these regions display the greatest inter-subject variability, our findings warrant caution in interpretation 
of results that may average together functional connectivity statistics for these networks across a group of 
varied individuals. 

It is unclear how much the step of normalization to an anatomic standard (completed as part of standard preprocessing) may affect these results and inform our determination of inter-individual differences. Currently, such anatomic warping is standard practice in processing both task-based and resting-state fMRI [9]. It is possible that such warping may impart a signal in the derived functional connectivity that allows for individual subject differentiation based more on anatomy than on fluctuations in neurovascular coupling. However, given that the typical native voxel volume of BOLD imaging (~3 mm isotropic, i.e. ~27 mm³) is roughly 25-fold larger than that of typical anatomic T1 acquisitions (~1 mm isotropic, i.e. ~1 mm³), we find it unlikely that the BOLD acquisition would have the spatial resolution to distinguish these normal subjects based significantly on anatomy. There are no known methods for exactly comparing functional connectivity graphs of unwarped brains in the manner we have done here. Indeed, this graph isomorphism problem may be NP-complete for an exact solution [25]. Future work will seek to complete a similar analysis for BOLD data that is not spatially warped to an anatomic standard.

Further avenues for research include refining our map of brain regions and connections that are most stable across individuals versus within individuals. Additionally, it would be of interest to define a typical time scale for the stability of individual functional connectivity patterns, as certain regions and connections may vary across minutes, days, months or years depending on the changing cognitive state of the subject.
5 Conclusion


In this study, we have developed a non-parametric measure to evaluate the degree to which a given acquisition and analysis scheme can differentiate individual subjects. Using this metric, we see that there is a relative trade-off between increasing temporal sampling through lower TR and through longer acquisition times. We further find that only 4-5 min of acquisition time is necessary to perfectly differentiate individual subjects using the described, standard methods. We find that the brain regions that contribute most to this individual subject characterization lie in regions of higher cognitive processing. These results have application in study design for analyzing individual subject-level determinants of behavior and in the clinical evaluation of individual patients.